An Optimization Text Summarization Method Based on Naïve Bayes and Topic Word for Single Syllable Language

Authors

  • Ha Nguyen
  • Thi Thu

Abstract

Text summarization has been studied since the late 1950s, when a simple technique based on term frequency was applied to summarizing technical texts at IBM. After more than 50 years of development, text summarization remains an active topic. It attracts many researchers and scholars in data mining and natural language processing, who continue to propose new summarization systems. For English, several automatic summarization systems have been built, such as SUMMARIST and SweSum, among others. The situation is different for single-syllable languages such as Chinese, Vietnamese, Japanese, Thai, Mongolian, and other native languages of Southeast and East Asia. Users of single-syllable languages account for more than 60% of all language users in the world, so processing these languages is essential. It is also difficult: words or terms cannot be identified from white space alone, and no current word segmentation tool reaches 100% accuracy. In this paper, we propose a text summarization method based on the Naïve Bayes algorithm and a set of topic words. Experiments on 320 Vietnamese texts (11,670 sentences) show that the method is effective. The generated summaries are readable, understandable, and close to those written by humans.
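The abstract names the two ingredients of the method, a Naïve Bayes classifier and a set of topic words, but not how they are combined. The Python sketch below is a hypothetical illustration of one common way to build a Naïve Bayes extractive summarizer. It is not the authors' implementation. The following are all assumptions:

- the three boolean features (position in the first third, presence of a topic word, length of at least eight tokens);
- the rule that topic words are the most frequent tokens;
- every function and class name;
- the toy data.

The sketch also splits on white space. For Vietnamese this yields syllables rather than words, which is the segmentation problem the abstract describes.

```python
import math
from collections import Counter


def topic_words(sentences, k=5, stopwords=frozenset()):
    """Hypothetical rule: treat the k most frequent tokens in the document as topic words."""
    counts = Counter(
        tok for s in sentences for tok in s.lower().split() if tok not in stopwords
    )
    return {word for word, _ in counts.most_common(k)}


def document_features(sentences, n_topics=5):
    """Assumed boolean features per sentence: position, topic-word presence, length."""
    topics = topic_words(sentences, n_topics)
    n = len(sentences)
    feats = []
    for i, s in enumerate(sentences):
        # White-space splitting yields syllables for Vietnamese, not words;
        # a real system would run a word segmenter here instead.
        toks = s.lower().split()
        feats.append({
            "first_third": i < max(1, n // 3),
            "has_topic_word": any(t in topics for t in toks),
            "long": len(toks) >= 8,
        })
    return feats


class BernoulliNaiveBayes:
    """Naïve Bayes over boolean features with Laplace (add-one) smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.prior = {c: y.count(c) / len(y) for c in self.classes}
        self.likelihood = {}  # (class, feature) -> P(feature is True | class)
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            for f in X[0]:
                hits = sum(1 for x in rows if x[f])
                self.likelihood[(c, f)] = (hits + 1) / (len(rows) + 2)
        return self

    def log_joint(self, x, c):
        # log P(c) + sum of log P(x_f | c), assuming features are independent given c.
        total = math.log(self.prior[c])
        for f, value in x.items():
            p = self.likelihood[(c, f)]
            total += math.log(p if value else 1.0 - p)
        return total

    def prob(self, x, c=True):
        # Normalize the joint scores into P(c | x) using the log-sum-exp trick.
        logs = {k: self.log_joint(x, k) for k in self.classes}
        top = max(logs.values())
        z = sum(math.exp(v - top) for v in logs.values())
        return math.exp(logs[c] - top) / z


def summarize(sentences, model, k=2, n_topics=5):
    """Keep the k sentences most likely to be summary sentences, in original order."""
    scores = [model.prob(x, True) for x in document_features(sentences, n_topics)]
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))[:k]
    return [sentences[i] for i in sorted(ranked)]


if __name__ == "__main__":
    # Made-up toy document with hand-labelled summary sentences.
    doc = [
        "topic summary method naive bayes topic",
        "weather is nice today",
        "the method uses topic words and naive bayes",
        "unrelated sentence here",
    ]
    labels = [True, False, True, False]
    model = BernoulliNaiveBayes().fit(document_features(doc), labels)
    print(summarize(doc, model, k=2))
```

Each sentence is ranked by its posterior probability of belonging to the summary rather than by a hard yes/no decision. This lets the caller choose the summary length k independently of any classification threshold.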

Similar resources

Biogeography-Based Optimization Algorithm for Automatic Extractive Text Summarization

    Given the increasing number of documents, sites, online sources, and the users’ desire to quickly access information, automatic textual summarization has caught the attention of many researchers in this field. Researchers have presented different methods for text summarization as well as a useful summary of those texts including relevant document sentences. This study select...

EXTRACTION-BASED TEXT SUMMARIZATION USING FUZZY ANALYSIS

Due to the explosive growth of the world-wide web, automatic text summarization has become an essential tool for web users. In this paper we present a novel approach for creating text summaries. Using fuzzy logic and word-net, our model extracts the most relevant sentences from an original document. The approach utilizes fuzzy measures and inference on the extracted textual information from the docu...

Text Summarization Using Cuckoo Search Optimization Algorithm

Today, with the rapid growth of the World Wide Web and the creation of Internet sites and online text resources, text summarization has attracted the attention of many researchers. Extractive text summarization is an important summarization method that consists of selecting the top representative sentences from the input document. When we are facing documents with large data volumes, the extr...

A New Document Embedding Method for News Classification

Abstract- Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. There is lots of news spreading on the web. A text classifier can categorize news automatically and this facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...

Combining Prediction by Partial Matching and Logistic Regression for Thai Word Segmentation

Word segmentation is an important part of many applications, including information retrieval, information filtering, document analysis, and text summarization. In Thai language, the process is complicated since words are written continuously, and their structures are not well-defined. A recognized effective approach to word segmentation is Longest Matching, a method based on dictionary. Neverth...

Publication date: 2014